Information Extraction from Text Corpora: Using Filters on Collocation Sets

Authors

  • Gerhard Heyer
  • Uwe Quasthoff
  • Christian Wolff
Abstract

This paper describes the application of filtering techniques to collocation sets calculated for very large text corpora. Additional information associated with the collocations, such as patterns, grammatical information, subject areas, and numerical values, is used to identify collocations with a given semantic structure. Various examples and different techniques for applying such filters are described. We also give several examples of practical applications for this type of information extraction.
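
The filter idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Collocation` record, its fields, the filter helper names, and the toy data are all assumptions made up for illustration. It only shows how filters over patterns, grammatical information, subject areas, and numerical values might be combined over a precomputed collocation set.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Collocation:
    """Hypothetical record for one entry of a collocation set."""
    word: str            # the headword the set was computed for
    partner: str         # the co-occurring word
    partner_pos: str     # part-of-speech tag of the partner (assumed tagset)
    subject_area: str    # subject area label attached to the partner
    significance: float  # numerical association score (assumed scale)


Filter = Callable[[Collocation], bool]


def pattern_filter(regex: str) -> Filter:
    """Keep collocations whose partner fully matches a string pattern."""
    compiled = re.compile(regex)
    return lambda c: compiled.fullmatch(c.partner) is not None


def pos_filter(*tags: str) -> Filter:
    """Keep collocations whose partner carries one of the given POS tags."""
    allowed = set(tags)
    return lambda c: c.partner_pos in allowed


def subject_filter(area: str) -> Filter:
    """Keep collocations whose partner belongs to the given subject area."""
    return lambda c: c.subject_area == area


def min_significance(threshold: float) -> Filter:
    """Keep collocations whose score is at least the threshold."""
    return lambda c: c.significance >= threshold


def apply_filters(collocations: Iterable[Collocation],
                  filters: Iterable[Filter]) -> List[Collocation]:
    """Return the collocations that pass every filter (logical AND)."""
    active = list(filters)
    return [c for c in collocations if all(f(c) for f in active)]


# Made-up toy data, not taken from the paper or any real corpus.
SAMPLE = [
    Collocation("Mozart", "composer", "NOUN", "music", 12.4),
    Collocation("Mozart", "composed", "VERB", "music", 9.1),
    Collocation("Mozart", "Salzburg", "PROPN", "geography", 7.8),
    Collocation("Mozart", "ball", "NOUN", "food", 1.2),
]

if __name__ == "__main__":
    nouns = apply_filters(SAMPLE, [pos_filter("NOUN"), min_significance(5.0)])
    print([c.partner for c in nouns])
```

Representing each filter as a plain predicate keeps the combination step trivial; other combinations (for example, a union of filters) would need a different `apply_filters`.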

Similar resources

Inducing Discourse Connectives from Parallel Texts

Discourse connectives (e.g. however, because) are terms that explicitly express discourse relations in a coherent text. While a list of discourse connectives is useful for both theoretical and empirical research on discourse relations, few languages currently possess such a resource. In this article, we propose a new method that exploits parallel corpora and collocation extraction techniques to...

Accurate Collocation Extraction Using a Multilingual Parser

This paper focuses on the use of advanced techniques of text analysis as support for collocation extraction. A hybrid system is presented that combines statistical methods and multilingual parsing for detecting accurate collocational information from English, French, Spanish and Italian corpora. The advantage of relying on full parsing over using a traditional window method (which ignores the s...

Collocation Translation Acquisition Using Monolingual Corpora

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...

Synonymous Collocation Extraction Using Translation Information

Automatically acquiring synonymous collocation pairs from corpora is a challenging task. For this task, we can, in general, have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or use bilingual corpora alone are apparently inadequate because of low precision or low coverage. I...

Acquisition of Phraseological Units from Linguistically Interpreted Corpora: A Case Study on German PP-Verb Collocations

In this paper, we show that accessibility of syntactic information eases collocation extraction from corpora, and supports identification of lexical and structural restrictions related to collocations. For collocation identification we use a corpus that is automatically annotated applying a part-of-speech tagger and a phrase chunker.

Publication date: 2002